OPA-3D: Occlusion-Aware Pixel-Wise Aggregation for Monocular 3D Object Detection
Authors
Abstract
Monocular 3D object detection has recently made a significant leap forward thanks to the use of pre-trained depth estimators for pseudo-LiDAR recovery. Yet, such two-stage methods typically suffer from overfitting and are incapable of explicitly encapsulating the geometric relation between depth and object bounding box. To overcome this limitation, we instead propose OPA-3D, an Occlusion-Aware Pixel-Wise Aggregation network that jointly estimates dense scene depth together with depth-bounding box residuals and object bounding boxes, allowing a two-stream detection of 3D objects that harnesses both geometry and context information. Thereby, the first stream combines visible depth and depth-bounding box residuals to recover object bounding boxes via explicit occlusion-aware optimization. In addition, a bounding-box-based geometry projection scheme is employed in an effort to enhance distance perception. The second stream, named the Context Stream, directly regresses 3D object location and size. This novel two-stream representation enables us to enforce cross-stream consistency terms, which align the outputs of both streams and further improve overall performance. Extensive experiments on the public benchmark demonstrate that OPA-3D outperforms state-of-the-art methods on the main Car category, whilst keeping real-time inference speed.
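The abstract does not give formulas, so the following is only a minimal sketch of the general ideas it names, under stated assumptions, and not the authors' implementation. It assumes (1) a pinhole camera, so an object of 3D height H metres whose 2D box is h pixels tall at focal length f pixels lies at roughly z = f * H / h; (2) each visible pixel votes for the object-center depth as its estimated surface depth plus a predicted depth-to-box residual; and (3) a plain L1 difference as the cross-stream consistency term. Occlusion awareness is reduced here to skipping pixels flagged as occluded, and a weighted mean stands in for the paper's optimization. The function names `projected_depth`, `aggregate_center_depth`, and `consistency_l1` are hypothetical.

```python
from typing import Optional, Sequence


def projected_depth(focal_px: float, height_3d_m: float, height_2d_px: float) -> float:
    """Pinhole estimate of object distance, z = f * H / h (assumes an upright object)."""
    if height_2d_px <= 0:
        raise ValueError("2D box height must be positive")
    return focal_px * height_3d_m / height_2d_px


def aggregate_center_depth(
    visible_depth: Sequence[float],
    residual: Sequence[float],
    visible_mask: Sequence[bool],
    weight: Optional[Sequence[float]] = None,
) -> float:
    """Weighted mean of per-pixel votes z_i = d_i + r_i for the box-center depth.

    d_i: estimated visible (surface) depth at pixel i.
    r_i: predicted residual from that surface to the box-center depth (assumed).
    Pixels whose visible_mask entry is False (assumed occluded) are skipped.
    """
    num = 0.0
    den = 0.0
    for i, (d, r, visible) in enumerate(zip(visible_depth, residual, visible_mask)):
        if not visible:
            continue  # occluded pixel: its surface depth belongs to another object
        w = 1.0 if weight is None else weight[i]
        num += w * (d + r)
        den += w
    if den <= 0.0:
        raise ValueError("no visible pixels with positive weight")
    return num / den


def consistency_l1(depth_geometry: float, depth_context: float) -> float:
    """L1 disagreement between two streams' depth estimates (assumed form)."""
    return abs(depth_geometry - depth_context)


if __name__ == "__main__":
    # Two visible pixels vote for 20.0 m; the third pixel is marked occluded and ignored.
    z_geo = aggregate_center_depth([19.5, 19.25, 10.0], [0.5, 0.75, 0.25], [True, True, False])
    z_proj = projected_depth(720.0, 1.5, 54.0)
    print(z_geo, z_proj, consistency_l1(z_geo, 21.5))
```

For example, with f = 720 px, H = 1.5 m and h = 54 px, `projected_depth` gives 20.0 m. In a two-stream design, an estimate like this from the geometry side could be compared against a directly regressed context-side depth through a consistency term such as `consistency_l1`.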
Similar resources
Explicit Occlusion Reasoning for 3D Object Detection
Consider the problem of recognizing an object that is partially occluded in an image. The visible portions are likely to match learned appearance models for the object, but hidden portions will not. The (hypothetical) ideal system would consider only the visible object information, correctly ignoring all occluded regions. In purely 2D recognition, this requires inferring the occlusion present, ...
Monocular Object Detection Using 3D Geometric Primitives
Multiview object detection methods achieve robustness in adverse imaging conditions by exploiting projective consistency across views. In this paper, we present an algorithm that achieves performance comparable to multiview methods from a single camera by employing geometric primitives as proxies for the true 3D shape of objects, such as pedestrians or vehicles. Our key insight is that for a ca...
Pixel-wise object tracking
In this paper, we propose a novel pixel-wise visual object tracking framework that can track any anonymous object in a noisy background. The framework consists of two submodels, a global attention model and a local segmentation model. The global model generates a region of interest (ROI) that the object may lie in the new frame based on the past object segmentation maps; while the local model ...
Occlusion-aware 3D Morphable Face Models
We propose a probabilistic occlusion-aware 3D Morphable Face Model adaptation framework for face image analysis based on the Analysis-by-Synthesis setup. In natural images, parts of the face are often occluded by a variety of objects. Such occlusions are a challenge for face model adaptation. We propose to segment the image into face and non-face regions and model them separately. The segmentat...
Joint 3D Proposal Generation and Object Detection from View Aggregation
We present AVOD, an Aggregate View Object Detection network for autonomous driving scenarios. The proposed neural network architecture uses LIDAR point clouds and RGB images to generate features that are shared by two subnetworks: a region proposal network (RPN) and a second stage detector network. The proposed RPN uses a novel architecture capable of performing multimodal feature fusion to gen...
Journal
Journal title: IEEE Robotics and Automation Letters
Year: 2023
ISSN: 2377-3766
DOI: https://doi.org/10.1109/lra.2023.3238137